Semi-supervised Regression with Order Preferences

Authors

  • Xiaojin Zhu
  • Andrew B. Goldberg
Abstract

Following a discussion of the general form of regularization for semi-supervised learning, we propose a semi-supervised regression algorithm. It is based on the assumption that we have certain order preferences on unlabeled data (e.g., point x1 has a larger target value than x2). Semi-supervised learning consists of enforcing the order preferences as regularization in a risk minimization framework. The optimization problem can be effectively solved by a linear program. Experiments show that the proposed semi-supervised regression outperforms standard regression.

1 Semi-supervised learning as regularization on unlabeled data

Semi-supervised learning works when its assumption on unlabeled data, often expressed as regularization, fits the reality of the problem domain. In this paper we first generalize the regularization formulation of some common semi-supervised learning approaches, namely manifold regularization, semi-supervised support vector machines, and multi-view learning [1, 2, 3]. Regularization for each individual approach is not new; however, these approaches have been studied largely in isolation. Our general form serves as a bridge to connect them, and to inspire novel semi-supervised approaches. As an example of the latter, we propose a novel algorithm for semi-supervised regression. The proposed regression algorithm is able to incorporate domain knowledge about the relative order of target values on unlabeled points. It thus differs from, and complements, existing semi-supervised regression methods, which do not use such domain knowledge but require multiple views [4, 5].

Let us review the three common semi-supervised learning methods. Manifold regularization [6, 7] generalizes several graph-based semi-supervised learning methods. Let l be the number of labeled points and u the number of unlabeled points. Graph-based semi-supervised learning requires a weighted, undirected graph, characterized by an (l + u) × (l + u) weight matrix W defined on the labeled and unlabeled data. It is assumed that domain experts can assign a non-negative weight wij from the features of two points xi, xj (e.g., by computing their Euclidean distance). A large wij implies a preference for f(xi), f(xj) to be similar; therefore subgraphs with large weights tend to have the same label. This is sometimes called the cluster assumption. Let K be a kernel and H the corresponding Reproducing Kernel Hilbert Space (RKHS). Let y be the labels, which can be categories for classification or real numbers for regression. Manifold regularization seeks a prediction function f ∈ H, such that f is the solution to
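(the displayed objective is cut off in this excerpt; in the standard form of [6, 7] it reads)

$$\min_{f \in \mathcal{H}} \; \sum_{i=1}^{l} c\big(f(x_i), y_i\big) \;+\; \lambda_1 \|f\|_{\mathcal{H}}^{2} \;+\; \lambda_2 \sum_{i,j=1}^{l+u} w_{ij}\big(f(x_i) - f(x_j)\big)^{2},$$

where c(·, ·) is a loss function on the labeled points and λ1, λ2 ≥ 0 weight the RKHS norm and the graph-smoothness penalty, respectively.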


Related articles

Kernel Regression with Order Preferences

We propose a novel kernel regression algorithm which takes into account order preferences on unlabeled data. Such a preference states that point x1 has a larger target value than point x2, even though the target values of both are unknown. The order preferences can be viewed as side information, or as a form of weak labels, and our algorithm can be related to semi-supervised learning. Learn...
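The description above is truncated, but the core construction, order preferences entering risk minimization as linear constraints, is easy to sketch. The following is a minimal illustration rather than the paper's formulation (which works in an RKHS with an ε-insensitive loss): it fits a linear model with an L1-regularized absolute loss via scipy.optimize.linprog, charging a hinge slack for each violated preference f(x_a) ≥ f(x_b). The function name and the λ parameters are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def fit_order_pref_linear(X_l, y_l, X_u, prefs, lam1=1e-3, lam2=1.0):
    """Fit f(x) = <w, x> + b with an L1-regularized absolute loss,
    adding a hinge penalty for each violated order preference
    f(X_u[a]) >= f(X_u[b]).

    LP variables: z = [w_pos (d), w_neg (d), b (1), xi (l), delta (p)]
    with w = w_pos - w_neg, so ||w||_1 = sum(w_pos + w_neg).
    """
    l, d = X_l.shape
    p = len(prefs)
    n = 2 * d + 1 + l + p

    # Objective: sum(xi) + lam1 * ||w||_1 + lam2 * sum(delta)
    c = np.concatenate([np.full(2 * d, lam1), [0.0],
                        np.ones(l), np.full(p, lam2)])

    A, rhs = [], []
    for i in range(l):
        # f(x_i) - y_i <= xi_i
        row = np.zeros(n)
        row[:d], row[d:2 * d], row[2 * d] = X_l[i], -X_l[i], 1.0
        row[2 * d + 1 + i] = -1.0
        A.append(row); rhs.append(y_l[i])
        # y_i - f(x_i) <= xi_i
        row2 = -row
        row2[2 * d + 1 + i] = -1.0
        A.append(row2); rhs.append(-y_l[i])
    for k, (a, b) in enumerate(prefs):
        # Violation slack: f(X_u[b]) - f(X_u[a]) <= delta_k (intercept cancels)
        diff = X_u[b] - X_u[a]
        row = np.zeros(n)
        row[:d], row[d:2 * d] = diff, -diff
        row[2 * d + 1 + l + k] = -1.0
        A.append(row); rhs.append(0.0)

    bounds = ([(0, None)] * (2 * d)    # w_pos, w_neg >= 0
              + [(None, None)]         # intercept b free
              + [(0, None)] * (l + p)) # slacks >= 0
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    return res.x[:d] - res.x[d:2 * d], res.x[2 * d]  # (w, b)

# Toy usage: two labeled endpoints, order preferences on unlabeled points.
X_l, y_l = np.array([[0.0], [1.0]]), np.array([0.0, 1.0])
X_u = np.array([[0.2], [0.5], [0.8]])
prefs = [(2, 1), (1, 0)]  # f(0.8) >= f(0.5) >= f(0.2)
w, b = fit_order_pref_linear(X_l, y_l, X_u, prefs)
```

At the optimum each delta_k equals max(0, f(x_b) - f(x_a)), i.e., a hinge on the violated preference, which is what makes the whole problem a single linear program.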


An Effective Semi-Supervised Clustering Framework Integrating Pairwise Constraints and Attribute Preferences

Both the instance level knowledge and the attribute level knowledge can improve clustering quality, but how to effectively utilize both of them is an essential problem to solve. This paper proposes a wrapper framework for semi-supervised clustering, which aims to gracefully integrate both kinds of prior knowledge in the clustering process, the instance level...


Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and lacks extrapolating power. Based on these observations, we propose to use the second-order Hessian energy for semi-supervised regression, which overcomes both of these problems. If the data lies on or close to a low-dimensional submanifold in feature space, the Hess...
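For context, a hedged statement of the regularizer this abstract refers to: in its continuous form, the Hessian energy of f on a manifold M is

$$S(f) = \int_{\mathcal{M}} \big\| \nabla_a \nabla_b f(x) \big\|_{F}^{2} \, dV(x),$$

the integrated Frobenius norm of the second covariant derivative of f. Its null space contains functions that are linear in local (normal) coordinates, which is why, unlike the Laplacian penalty, the minimizer is neither biased towards a constant nor unable to extrapolate.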


Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels

In this paper, we introduce a novel approach, called Input Output Kernel Regression (IOKR), for learning mappings between structured inputs and structured outputs. The approach belongs to the family of Output Kernel Regression methods devoted to regression in feature space endowed with some output kernel. In order to take into account structure in input data and benefit from kernels in the inpu...
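The blurb above is truncated, but the model class it refers to can be summarized; as a hedged sketch of the Output Kernel Regression family, with an operator-valued input kernel 𝒦x and an output feature map ψ, one learns

$$h(x) = \sum_{i=1}^{n} \mathcal{K}_x(x, x_i)\, c_i, \qquad \hat{y}(x) = \arg\min_{y \in \mathcal{Y}} \big\| \psi(y) - h(x) \big\|^{2},$$

where the coefficients c_i live in the output feature space and the arg min is a pre-image (decoding) step that maps the predicted feature vector back to a structured output.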


Self-Train LogitBoost for Semi-supervised Learning

Semi-supervised classification methods use unlabeled data in combination with a smaller set of labeled examples in order to increase the classification rate over supervised methods, in which training is carried out using labeled data alone. In this work, a self-train LogitBoost algorithm is presented. The self-train process improves the results ...
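The blurb is cut off, but the wrapper it describes follows the standard self-training loop: fit on the labeled set, pseudo-label the unlabeled points the model predicts most confidently, and refit. A minimal sketch follows; scikit-learn ships no LogitBoost, so GradientBoostingClassifier stands in for the base learner, and the confidence threshold is an illustrative parameter.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Self-training wrapper: train, pseudo-label confident unlabeled
    points, retrain. GradientBoostingClassifier is only a stand-in for
    LogitBoost, which scikit-learn does not provide."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    clf = GradientBoostingClassifier()
    for _ in range(max_rounds):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is predicted confidently enough; stop early
        # Promote confident predictions to pseudo-labels, grow the labeled set.
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]
    return clf
```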




Publication date: 2006